lemma 19
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- South America > Paraguay > Asunción > Asunción (0.04)
- North America > United States > California > Santa Clara County > Santa Clara (0.04)
- Asia > Middle East > Jordan (0.04)
Contents of Appendix A Extended Literature Review 14 B Time Uniform Lasso Analysis 15 C Results on Exploration 18 C.1 ALE
Table 2 compares recent work on sparse linear bandits based on a number of important factors. Some of the mentioned bounds depend on problem-dependent parameters (e.g. Carpentier and Munos [ 2012 ] assume that the action set is a Euclidean ball, and that the noise is directly added to the parameter vector, i.e. In this setting, Carpentier and Munos [ 2012 ] present a O ( d p n) regret bound. Li et al. [ 2022 ] require a stronger condition This is generally not true, but may hold with high probability.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- South America > Paraguay > Asunción > Asunción (0.04)
- Oceania > Australia > New South Wales (0.04)
- (2 more...)
Contents of Appendix A Extended Literature Review 14 B Time Uniform Lasso Analysis 15 C Results on Exploration 18 C.1 ALE
Table 2 compares recent work on sparse linear bandits based on a number of important factors. Some of the mentioned bounds depend on problem-dependent parameters (e.g. Carpentier and Munos [ 2012 ] assume that the action set is a Euclidean ball, and that the noise is directly added to the parameter vector, i.e. In this setting, Carpentier and Munos [ 2012 ] present a O ( d p n) regret bound. Li et al. [ 2022 ] require a stronger condition This is generally not true, but may hold with high probability.
- Europe > Austria > Vienna (0.14)
- Europe > Switzerland > Vaud > Lausanne (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Minimum width for universal approximation using squashable activation functions
Shin, Jonghyun, Kim, Namjun, Hwang, Geonho, Park, Sejun
The exact minimum width that allows for universal approximation of unbounded-depth networks is known only for ReLU and its variants. In this work, we study the minimum width of networks using general activation functions. Specifically, we focus on squashable functions that can approximate the identity function and binary step function by alternatively composing with affine transformations. We show that for networks using a squashable activation function to universally approximate $L^p$ functions from $[0,1]^{d_x}$ to $\mathbb R^{d_y}$, the minimum width is $\max\{d_x,d_y,2\}$ unless $d_x=d_y=1$; the same bound holds for $d_x=d_y=1$ if the activation function is monotone. We then provide sufficient conditions for squashability and show that all non-affine analytic functions and a class of piecewise functions are squashable, i.e., our minimum width result holds for those general classes of activation functions.
A Generalized Version of Chung's Lemma and its Applications
Jiang, Li, Li, Xiao, Milzarek, Andre, Qiu, Junwen
Chung's lemma is a classical tool for establishing asymptotic convergence rates of (stochastic) optimization methods under strong convexity-type assumptions and appropriate polynomial diminishing step sizes. In this work, we develop a generalized version of Chung's lemma, which provides a simple non-asymptotic convergence framework for a more general family of step size rules. We demonstrate broad applicability of the proposed generalized Chung's lemma by deriving tight non-asymptotic convergence rates for a large variety of stochastic methods. In particular, we obtain partially new non-asymptotic complexity results for stochastic optimization methods, such as stochastic gradient descent and random reshuffling, under a general $(\theta,\mu)$-Polyak-Lojasiewicz (PL) condition and for various step sizes strategies, including polynomial, constant, exponential, and cosine step sizes rules. Notably, as a by-product of our analysis, we observe that exponential step sizes can adapt to the objective function's geometry, achieving the optimal convergence rate without requiring exact knowledge of the underlying landscape. Our results demonstrate that the developed variant of Chung's lemma offers a versatile, systematic, and streamlined approach to establish non-asymptotic convergence rates under general step size rules.
- Asia > China > Guangdong Province > Shenzhen (0.04)
- North America > United States > New York (0.04)
- Europe > Russia (0.04)
- (3 more...)
Entrywise Inference for Causal Panel Data: A Simple and Instance-Optimal Approach
Yan, Yuling, Wainwright, Martin J.
In causal inference with panel data under staggered adoption, the goal is to estimate and derive confidence intervals for potential outcomes and treatment effects. We propose a computationally efficient procedure, involving only simple matrix algebra and singular value decomposition. We derive non-asymptotic bounds on the entrywise error, establishing its proximity to a suitably scaled Gaussian variable. Despite its simplicity, our procedure turns out to be instance-optimal, in that our theoretical scaling matches a local instance-wise lower bound derived via a Bayesian Cram\'{e}r-Rao argument. Using our insights, we develop a data-driven procedure for constructing entrywise confidence intervals with pre-specified coverage guarantees. Our analysis is based on a general inferential toolbox for the SVD algorithm applied to the matrix denoising model, which might be of independent interest.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > Spain > Basque Country (0.04)
The Variational Method of Moments
Bennett, Andrew, Kallus, Nathan
The conditional moment problem is a powerful formulation for describing structural causal parameters in terms of observables, a prominent example being instrumental variable regression. A standard approach is to reduce the problem to a finite set of marginal moment conditions and apply the optimally weighted generalized method of moments (OWGMM), but this requires we know a finite set of identifying moments, can still be inefficient even if identifying, or can be unwieldy and impractical if we use a growing sieve of moments. Motivated by a variational minimax reformulation of OWGMM, we define a very general class of estimators for the conditional moment problem, which we term the variational method of moments (VMM) and which naturally enables controlling infinitely-many moments. We provide a detailed theoretical analysis of multiple VMM estimators, including based on kernel methods and neural networks, and provide appropriate conditions under which these estimators are consistent, asymptotically normal, and semiparametrically efficient in the full conditional moment model. This is in contrast to other recently proposed methods for solving conditional moment problems based on adversarial machine learning, which do not incorporate optimal weighting, do not establish asymptotic normality, and are not semiparametrically efficient.
Ptarithmetic
The present article introduces ptarithmetic (short for "polynomial time arithmetic") -- a formal number theory similar to the well known Peano arithmetic, but based on the recently born computability logic (see http://www.cis.upenn.edu/~giorgi/cl.html) instead of classical logic. The formulas of ptarithmetic represent interactive computational problems rather than just true/false statements, and their "truth" is understood as existence of a polynomial time solution. The system of ptarithmetic elaborated in this article is shown to be sound and complete. Sound in the sense that every theorem T of the system represents an interactive number-theoretic computational problem with a polynomial time solution and, furthermore, such a solution can be effectively extracted from a proof of T. And complete in the sense that every interactive number-theoretic problem with a polynomial time solution is represented by some theorem T of the system. The paper is self-contained, and can be read without any previous familiarity with computability logic.